Abstract



In this paper we present a formalization of the centering approach to
modeling attentional structure in discourse and use it as the basis for
an algorithm to track discourse context and bind pronouns. As described
in [GJW86], the process of centering attention on entities in the
discourse gives rise to the intersentential transitional states of
continuing, retaining and shifting. We propose an extension to these
states which handles some additional cases of multiple ambiguous
pronouns. The algorithm has been implemented in an HPSG natural language
system which serves as the interface to a database query application.



1 Introduction 



In the approach to discourse structure developed in [Sid83] and [GJW86],
a discourse exhibits both global and local coherence. On this view, a
key element of local coherence is centering, a system of rules and
constraints that govern the relationship between what the discourse is
about and some of the linguistic choices made by the discourse
participants, e.g. choice of grammatical function, syntactic structure,
and type of referring expression (proper noun, definite or indefinite
description, reflexive or personal pronoun, etc.). Pronominalization in
particular serves to focus attention on what is being talked about;
inappropriate use or failure to use pronouns causes communication to be
less fluent. For instance, it takes longer for hearers to process a
pronominalized noun phrase that is not in focus than one that is, while
it takes longer to process a non-pronominalized noun phrase that is in
focus than one that is not [Gui85].



The [GJW86] centering model is based on the following assumptions. A
discourse segment consists of a sequence of utterances U1, ..., Um. With
each utterance Un is associated a list of forward-looking centers,
Cf(Un), consisting of those discourse entities that are directly
realized or realized^1 by linguistic expressions in the utterance.
Ranking of an entity on this list corresponds roughly to the likelihood
that it will be the primary focus of subsequent discourse; the first
entity on this list is the preferred center, Cp(Un). Un actually
centers, or is "about", only one entity at a time, the backward-looking
center, Cb(Un). The backward center is a confirmation of an entity that
has already been introduced into the discourse; more specifically, it
must be realized in the immediately preceding utterance, Un-1. There are
several distinct types of transitions from one utterance to the next.
The typology of transitions is based on two factors: whether or not the
center of attention, Cb, is the same from Un-1 to Un, and whether or not
this entity coincides with the preferred center of Un. Definitions of
these transition types appear in figure 1.
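The two-factor typology described above can be written as a small decision function. The following is only a sketch: representing discourse entities as plain strings and the function name `classify_transition` are illustrative assumptions, not part of the system described in this paper.

```python
from typing import Optional


def classify_transition(cb_prev: Optional[str],
                        cb: Optional[str],
                        cp: Optional[str]) -> str:
    """Classify the transition from Un-1 to Un.

    cb_prev is Cb(Un-1), cb is Cb(Un), cp is Cp(Un).
    Entities are plain strings here (an assumption for illustration).
    """
    # Factor 1: is the center of attention the same as before?
    if cb != cb_prev:
        return "SHIFTING"
    # Factor 2: does the center coincide with the preferred center of Un?
    return "CONTINUING" if cb == cp else "RETAINING"
```

For instance, an utterance that keeps POLLARD as Cb but puts FRIEDMAN first on its Cf list would be classified as RETAINING by this function.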

These transitions describe how utterances are 
linked together in a coherent local segment of dis- 
course. If a speaker has a number of propositions to 



^1 U directly realizes c if U is an utterance (of some phrase, not
necessarily a full clause) for which c is the semantic interpretation,
and U realizes c if either c is an element of the situation described by
the utterance U or c is directly realized by some subpart of U. Realizes
is thus a generalization of directly realizes [GJW86].
                     Cb(Un) = Cb(Un-1)     Cb(Un) ≠ Cb(Un-1)

Cb(Un) = Cp(Un)      CONTINUING            SHIFTING

Cb(Un) ≠ Cp(Un)      RETAINING             SHIFTING

Figure 1: Transition States



express, one very simple way to do this coherently is to express all the
propositions about a given entity (continuing) before introducing a
related entity (retaining) and then shifting the center to this new
entity. See figure 2. Retaining may be a way to signal an intention to
shift. While we do not claim that speakers really behave in such an
orderly fashion, an algorithm that expects this kind of behavior is more
successful than those which depend solely on recency or parallelism of
grammatical function. The interaction of centering with global focusing
mechanisms and with other factors such as intentional structure,
semantic selectional restrictions, verb tense and aspect, modality,
intonation and pitch accent are topics for further research.

Note that these transitions are more specific than focus movement as
described in [Sid83]. The extension we propose makes them more specific
still. Note also that the Cb of [GJW86] corresponds roughly to Sidner's
discourse focus and the Cf to her potential foci.

The formal system of constraints and rules for centering, as we have
interpreted them from [GJW86], are as follows. For each Un in
U1, ..., Um:

• CONSTRAINTS

  1. There is precisely one Cb.

  2. Every element of Cf(Un) must be realized in Un.

  3. Cb(Un) is the highest-ranked element of Cf(Un-1) that is realized
     in Un.

• RULES

  1. If some element of Cf(Un-1) is realized as a pronoun in Un, then
     so is Cb(Un).

  2. Continuing is preferred over retaining which is preferred over
     shifting.

As is evident in constraint 3, ranking of the items on the forward
center list, Cf, is crucial. We rank the items in Cf by obliqueness of
grammatical relation of the subcategorized functions of the main verb:
that is, first the subject, object, and object2, followed by other
subcategorized functions, and finally, adjuncts. This captures the idea
in [GJW86] that subjecthood contributes strongly to the priority of an
item on the Cf list.
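As a rough illustration of how this ranking feeds constraint 3, the sketch below orders realized entities by grammatical function and then picks the backward-looking center. The function labels in `OBLIQUENESS`, the string entity names, and both function names are assumptions made for the example; they are not taken from the HPSG system.

```python
from typing import List, Optional, Tuple

# Assumed labels for the obliqueness hierarchy described in the text:
# subject, object, object2, other subcategorized functions, adjuncts.
OBLIQUENESS = ["subject", "object", "object2", "other", "adjunct"]


def rank_cf(realizations: List[Tuple[str, str]]) -> List[str]:
    """Build a ranked Cf list from (entity, grammatical_function) pairs."""
    ordered = sorted(realizations, key=lambda pair: OBLIQUENESS.index(pair[1]))
    cf: List[str] = []
    for entity, _function in ordered:
        if entity not in cf:  # keep only the highest-ranked mention
            cf.append(entity)
    return cf


def backward_center(cf_prev: List[str], realized_now: List[str]) -> Optional[str]:
    """Constraint 3: the highest-ranked element of Cf(Un-1) realized in Un."""
    for entity in cf_prev:
        if entity in realized_now:
            return entity
    return None


# "Friedman races her on weekends."
cf_prev = rank_cf([("WEEKEND", "adjunct"),
                   ("BRENNAN", "object"),
                   ("FRIEDMAN", "subject")])
```

With `cf_prev` as above, a following utterance that realizes both FRIEDMAN and BRENNAN gets FRIEDMAN as its Cb, because FRIEDMAN is ranked higher on the previous Cf list.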

We are aware that this ranking usually coincides 
with surface constituent order in English. It would 
be of interest to examine data from languages with 
relatively freer constituent order (e.g. German) to 
determine the influence of constituent order upon 
centering when the grammatical functions are held 
constant. In addition, languages that provide an 
identifiable topic function (e.g. Japanese) suggest 
that topic takes precedence over subject. 

The part of the HPSG system that uses the centering algorithm for
pronoun binding is called the pragmatics processor. It interacts with
another module called the semantics processor, which computes
representations of intrasentential anaphoric relations (among other
things). The semantics processor has access to information such as the
surface syntactic structure of the utterance. It provides the pragmatics
processor with representations which include a set of reference markers.
                     Cb(Un) = Cb(Un-1)     Cb(Un) ≠ Cb(Un-1)

Cb(Un) = Cp(Un)      CONTINUING            SHIFTING-1

Cb(Un) ≠ Cp(Un)      RETAINING             SHIFTING

Figure 3: Extended Transition States



CONTINUING...
Un+1: Carl works at HP on the Natural Language Project.
      Cb: [POLLARD:Carl]
      Cf: ([POLLARD:Carl] [HP:HP]
           [NATLANG:Natural Language Project])
CONTINUING...
Un+2: He manages Lyn.
      Cb: [POLLARD:Carl]
      Cf: ([POLLARD:A1] [FRIEDMAN:Lyn])
      He = Carl
CONTINUING...
Un+3: He promised to get her a raise.
      Cb: [POLLARD:A1]
      Cf: ([POLLARD:A2] [FRIEDMAN:A3] [RAISE:X1])
      He = Carl, her = Lyn
RETAINING...
Un+4: She doesn't believe him.
      Cb: [POLLARD:A2]
      Cf: ([FRIEDMAN:A4] [POLLARD:A5])
      She = Lyn, him = Carl

Figure 2:

Each reference marker is contraindexed^2 with expressions with which it
cannot co-specify^3. Reference markers also carry information about
agreement and grammatical function. Each pronominal reference marker has
a unique index from A1, ..., An and is displayed in the figures in the
form [POLLARD:A1], where POLLARD is the semantic representation of the
co-specifier. For non-pronominal reference markers the surface string is
used as the index. Indices for indefinites are generated from
X1, ..., Xn.

^2 See ... and [Cho80] for conditions on coreference.

^3 See [Sid83] for definition and discussion of co-specification. Note
that this use of co-specification is not the same as that used in
[Sel85].

2 Extension

The constraints proposed by [GJW86] fail in certain examples like the
following (read with pronouns destressed):

    Brennan drives an Alfa Romeo.
    She drives too fast.
    Friedman races her on weekends.
    She often beats her.

This example is characterized by its multiple ambiguous pronouns and by
the fact that the final utterance achieves a shift (see figure 4). A
shift is inevitable because of constraint 3, which states that the
Cb(Un) must equal the Cp(Un-1) (since the Cp(Un-1) is directly realized
by the subject of Un, "Friedman"). However the constraints and rules
from [GJW86] would fail to make a choice here between the
co-specification possibilities for the pronouns in Un. Given that the
transition is a shift, there seem to be more and less coherent ways to
shift. Note that the three items being examined in order to characterize
the transition between each pair of anchors^4 are the Cb of Un-1, the Cb
of Un, and the Cp of Un. By [GJW86] a shift occurs whenever successive
Cb's are not the same. This definition of shifting does not consider

^4 An anchor is a <Cb, Cf> pair for an utterance.
CONTINUING...
Un+1: Brennan drives an Alfa Romeo.
      Cb: [BRENNAN:Brennan]
      Cf: ([BRENNAN:Brennan] [X2:Alfa Romeo])
CONTINUING...
Un+2: She drives too fast.
      Cb: [BRENNAN:Brennan]
      Cf: ([BRENNAN:A7])
      She = Brennan
RETAINING...
Un+3: Friedman races her on weekends.
      Cb: [BRENNAN:A7]
      Cf: ([FRIEDMAN:Friedman] [BRENNAN:A8] [WEEKEND:X3])
      her = Brennan
SHIFTING-1...
Un+4: She often beats her.
      Cb: [FRIEDMAN:Friedman]
      Cf: ([FRIEDMAN:A9] [BRENNAN:A10])
      She = Friedman, her = Brennan

Figure 4:



whether the Cb of Un and the Cp of Un are equal. It seems that the
status of the Cp of Un should be as important in this case as it is in
determining the retaining/continuing distinction.

Therefore, we propose the following extension which handles some
additional cases containing multiple ambiguous pronouns: we have
extended rule 2 so that there are two kinds of shifts. A transition for
Un is ranked more highly if Cb(Un) = Cp(Un); this state we call
shifting-1 and it represents a more coherent way to shift. The preferred
ranking is continuing > retaining > shifting-1 > shifting (see figure
3). This extension enables us to successfully bind the "she" in the
final utterance of the example in figure 4 to "Friedman." The appendix
illustrates the application of the algorithm to figure 4.
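The extended typology and its preference ordering can be sketched as follows. This is a minimal illustration assuming entities are plain strings and each candidate reading is summarized as a (Cb, Cp) pair; the names `classify_extended`, `TRANSITION_RANK` and `preferred_reading` are hypothetical.

```python
from typing import List, Optional, Tuple

# Extended rule 2: continuing > retaining > shifting-1 > shifting.
TRANSITION_RANK = {"CONTINUING": 0, "RETAINING": 1, "SHIFTING-1": 2, "SHIFTING": 3}


def classify_extended(cb_prev: Optional[str],
                      cb: Optional[str],
                      cp: Optional[str]) -> str:
    """Classify a transition using the four extended states."""
    if cb == cb_prev:
        return "CONTINUING" if cb == cp else "RETAINING"
    # The center has changed: distinguish the more coherent shift.
    return "SHIFTING-1" if cb == cp else "SHIFTING"


def preferred_reading(cb_prev: Optional[str],
                      readings: List[Tuple[str, str]]) -> Tuple[str, str]:
    """Pick the (Cb, Cp) reading whose transition is ranked most highly."""
    return min(readings,
               key=lambda r: TRANSITION_RANK[classify_extended(cb_prev, r[0], r[1])])


# "She often beats her." after "Friedman races her on weekends."
# Cb(Un-1) is BRENNAN; both readings have Cb(Un) = FRIEDMAN.
readings = [("FRIEDMAN", "BRENNAN"),   # She = Brennan, her = Friedman
            ("FRIEDMAN", "FRIEDMAN")]  # She = Friedman, her = Brennan
best = preferred_reading("BRENNAN", readings)
```

Under these assumptions the second reading is a shifting-1 and the first a plain shift, so `best` is the reading in which "She" co-specifies Friedman.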



Kameyama [Kam86] has proposed another extension to the [GJW86] theory, a
property-sharing constraint which attempts to enforce a parallelism
between entities in successive utterances. She considers two properties:
SUBJ and IDENT. With her extension, subject pronouns prefer subject
antecedents and non-subject pronouns prefer non-subject antecedents.
However, structural parallelism is a consequence of our ordering the Cf
list by grammatical function and the preference


CONTINUING...
Un+1: Who is Max waiting for?
      Cb: [PLANCK:Max]
      Cf: ([PLANCK:Max])
CONTINUING...
Un+2: He is waiting for Fred.
      Cb: [PLANCK:Max]
      Cf: ([PLANCK:A1] [FLINTSTONE:Fred])
      He = Max
CONTINUING...
Un+3: He invited him to dinner.
      Cb: [PLANCK:A1]
      Cf: ([PLANCK:A2] [FLINTSTONE:A3])
      He = Max, him = Fred

Figure 5:



for continuing over retaining. Furthermore, the constraints suggested in
[GJW86] succeed in many cases without invoking an independent structural
parallelism constraint, due to the distinction between continuing and
retaining, which Kameyama fails to consider. Her example which we
reproduce in figure 5 can also be accounted for using the
continuing/retaining distinction^5. The third utterance in this example
has two interpretations which are both consistent with the centering
rules and constraints. Because of rule 2, the interpretation in figure 5
is preferred over the one in figure 6.



3 Algorithm for centering and pronoun binding



There are three basic phases to this algorithm. First the proposed
anchors are constructed, then they are filtered, and finally, they are
classified and ranked. The proposed anchors represent all the
co-specification relationships available for this utterance.

Each step is discussed and illustrated in figure 7. It would be possible
to classify and rank the proposed anchors before filtering them without
any other changes to the algorithm. In fact, using this strategy one
could see if the highest ranked proposal passed all the filters, or if
the next highest did, etc. The three filters in the filtering phase



^5 It seems that property sharing of IDENT is still necessary to account
for logophoric use of pronouns in Japanese.
Figure 7: Algorithm and Example



CONTINUING...
Un+1: Who is Max waiting for?
      Cb: [PLANCK:Max]
      Cf: ([PLANCK:Max])
CONTINUING...
Un+2: He is waiting for Fred.
      Cb: [PLANCK:Max]
      Cf: ([PLANCK:A1] [FLINTSTONE:Fred])
      He = Max
RETAINING...
Un+3: He invited him to dinner.
      Cb: [PLANCK:A1]
      Cf: ([FLINTSTONE:A3] [PLANCK:A2])
      He = Fred, him = Max

Figure 6:

may be done in parallel. The example we use to illustrate the algorithm
is in figure 7.
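The alternative control strategy mentioned above (rank first, then test candidates against the filters in rank order) might look like the following sketch. It is generic over the anchor representation; the function name `first_acceptable` and the toy rank and filter used in the example are assumptions for illustration, not the implemented algorithm.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Anchor = Tuple[str, Tuple[str, ...]]  # assumed shape: (Cb, Cf list)


def first_acceptable(anchors: Iterable[Anchor],
                     rank: Callable[[Anchor], int],
                     filters: List[Callable[[Anchor], bool]]) -> Optional[Anchor]:
    """Return the most highly ranked anchor that passes every filter.

    Anchors are sorted by rank (lower is better) and the filters are only
    run until the first anchor survives all of them.
    """
    for anchor in sorted(anchors, key=rank):
        if all(passes(anchor) for passes in filters):
            return anchor
    return None


# Toy data: three proposed anchors that are all shifts.
proposed = [("FRIEDMAN", ("BRENNAN", "FRIEDMAN")),
            ("FRIEDMAN", ("FRIEDMAN", "FRIEDMAN")),
            ("FRIEDMAN", ("FRIEDMAN", "BRENNAN"))]

# Toy rank: shifting-1 (Cb equals the first Cf element) before shifting.
shift_rank = lambda a: 0 if a[0] == a[1][0] else 1
# Toy filter: the two pronouns are contraindexed, so they must differ.
not_coindexed = lambda a: a[1][0] != a[1][1]

chosen = first_acceptable(proposed, shift_rank, [not_coindexed])
```

Because the filters are independent predicates, the order in which they are applied to a given anchor does not affect the result, which is consistent with the remark that they may be done in parallel.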



4 Discussion

4.1 Discussion of the algorithm

The goal of the current algorithm design was conceptual clarity rather
than efficiency. The hope is that the structure provided will allow easy
addition of further constraints and preferences. It would be simple to
change the control structure of the algorithm so that it first proposed
all the continuing or retaining anchors and then the shifting ones, thus
avoiding a precomputation of all possible anchors.

[GJW86] states that a realization may contribute more than one entity to
the Cf(U). This is true in cases when a partially specified semantic
description is consistent with more than one interpretation. There is no
need to enumerate explicitly all the possible interpretations when
constructing possible Cf(U)'s^6, as long as the associated semantic
theory allows partially specified interpretations. This also holds for
entities not directly realized in an utterance. On our view, after
referring to "a house" in Un, a reference to "the door" in Un+1 might be
gotten via inference from the representation for "a house" in Cf(Un).
Thus when the proposed anchors are constructed there is no possibility
of having an infinite number of potential Cf's for an utterance of
finite length.

^6 Barbara Grosz, personal communication, and [GJW86].

Another question is whether the preference ordering of transitions in
rule 2 should always be the same. For some examples, particularly where
Un contains a single pronoun and Un-1 is a retention, some informants
seem to have a preference for shifting, whereas the centering algorithm
chooses a continuation (see figure 8). Many of our informants have no
strong preference as to the co-specification of the unstressed "She" in
Un+4. Speakers can avoid ambiguity by stressing a pronoun with respect
to its phonological environment. A computational system for
understanding may need to explicitly acknowledge this ambiguity.

CONTINUING...
Un+1: Brennan drives an Alfa Romeo.
      Cb: [BRENNAN:Brennan]
      Cf: ([BRENNAN:Brennan] [ALFA:X1])
CONTINUING...
Un+2: She drives too fast.
      Cb: [BRENNAN:Brennan]
      Cf: ([BRENNAN:A7])
      She = Brennan
RETAINING...
Un+3: Friedman races her on weekends.
      Cb: [BRENNAN:A7]
      Cf: ([FRIEDMAN:Friedman] [BRENNAN:A8] [WEEKEND:X3])
      her = Brennan
CONTINUING...
Un+4: She goes to Laguna Seca.
      Cb: [BRENNAN:A8]
      Cf: ([BRENNAN:A9] [LAG-SEC:Laguna Seca])
      She = Brennan??

Figure 8:

A computational system for generation would try to plan a retention as a
signal of an impending shift, so that after a retention, a shift would
be preferred rather than a continuation.






4.2 Future Research 

Of course the local approach described here does 
not provide all the necessary information for in- 
terpreting pronouns; constraints are also imposed 
by world knowledge, pragmatics, semantics and 
phonology. 

There are other interesting questions concerning the centering
algorithm. How should the centering algorithm interact with an
inferencing mechanism? Should it make choices when there is more than
one proposed anchor with the same ranking? In a database query system,
how should answers be incorporated into the discourse model? How does
centering interact with a treatment of definite/indefinite NP's and
quantifiers?

We are exploring ideas for these and other exten- 
sions to the centering approach for modeling refer- 
ence in local discourse. 
6 Appendix

This illustrates the extension in the same detail as the example we used
in the algorithm. The numbering here corresponds to the numbered steps
in the algorithm figure 7. The example is the last utterance from figure
4.

EXAMPLE: She often beats her. 

1. CONSTRUCT THE PROPOSED ANCHORS

   (a) ([A9] [A10])

   (b) ([A9] [A10])

   (c) (([FRIEDMAN:A9] [FRIEDMAN:A10])
        ([FRIEDMAN:A9] [BRENNAN:A10])
        ([BRENNAN:A9] [BRENNAN:A10])
        ([BRENNAN:A9] [FRIEDMAN:A10]))

   (d) ([FRIEDMAN:Friedman] [BRENNAN:A8] [WEEKEND:X3] NIL)

   (e) There are 16 possible <Cb, Cf> pairs for this utterance.

       i.    <[FRIEDMAN:Friedman], ([FRIEDMAN:A9] [FRIEDMAN:A10])>
       ii.   <[FRIEDMAN:Friedman], ([FRIEDMAN:A9] [BRENNAN:A10])>
       iii.  <[FRIEDMAN:Friedman], ([BRENNAN:A9] [FRIEDMAN:A10])>
       iv.   <[FRIEDMAN:Friedman], ([BRENNAN:A9] [BRENNAN:A10])>
       v.    <[BRENNAN:A8], ([FRIEDMAN:A9] [FRIEDMAN:A10])>
       vi.   <[BRENNAN:A8], ([FRIEDMAN:A9] [BRENNAN:A10])>
       vii.  <[BRENNAN:A8], ([BRENNAN:A9] [FRIEDMAN:A10])>
       viii. <[BRENNAN:A8], ([BRENNAN:A9] [BRENNAN:A10])>
       ix.   <[WEEKEND:X3], ([FRIEDMAN:A9] [FRIEDMAN:A10])>
       x.    <[WEEKEND:X3], ([FRIEDMAN:A9] [BRENNAN:A10])>
       xi.   <[WEEKEND:X3], ([BRENNAN:A9] [FRIEDMAN:A10])>
       xii.  <[WEEKEND:X3], ([BRENNAN:A9] [BRENNAN:A10])>
       xiii. <NIL, ([FRIEDMAN:A9] [FRIEDMAN:A10])>
       xiv.  <NIL, ([FRIEDMAN:A9] [BRENNAN:A10])>
       xv.   <NIL, ([BRENNAN:A9] [FRIEDMAN:A10])>
       xvi.  <NIL, ([BRENNAN:A9] [BRENNAN:A10])>
2. FILTER THE PROPOSED ANCHORS 
   (a) Filter by contraindices. Anchors i, iv, v, viii, ix, xii, xiii,
       xvi are eliminated since [A9] and [A10] are contraindexed.

   (b) Constraint 3 filter eliminates proposed anchors vi, vii, ix
       through xvi.

(c) Rule 1 filter eliminates proposed anchors 
ix through xvi. 

3. CLASSIFY and RANK

   (a) After filtering there are only two anchors left.

       ii:  <[FRIEDMAN:Friedman], ([FRIEDMAN:A9] [BRENNAN:A10])>
       iii: <[FRIEDMAN:Friedman], ([BRENNAN:A9] [FRIEDMAN:A10])>

       Anchor ii is classified as shifting-1 whereas anchor iii is
       classified as shifting.

   (b) Anchor ii is more highly ranked.
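The three phases of this worked example can be replayed in a few lines of code. The sketch below is an illustration under simplifying assumptions: entities are plain strings, both pronouns may co-specify either FRIEDMAN or BRENNAN, and the filter functions are our own renderings of the contraindexing check, constraint 3 and rule 1 rather than the implemented filters. All three filters are applied jointly, so only the final survivors are compared with the appendix.

```python
from itertools import product

ROMAN = ["i", "ii", "iii", "iv", "v", "vi", "vii", "viii",
         "ix", "x", "xi", "xii", "xiii", "xiv", "xv", "xvi"]

CF_PREV = ["FRIEDMAN", "BRENNAN", "WEEKEND"]  # Cf(Un-1), ranked
CB_PREV = "BRENNAN"                           # Cb(Un-1)
REFERENTS = ["FRIEDMAN", "BRENNAN"]           # assumed candidates for A9, A10
RANK = {"CONTINUING": 0, "RETAINING": 1, "SHIFTING-1": 2, "SHIFTING": 3}


def propose_anchors():
    """Phase 1: every candidate Cb paired with every candidate Cf list."""
    cf_lists = list(product(REFERENTS, repeat=2))  # bindings for (A9, A10)
    return [(cb, cf) for cb in CF_PREV + [None] for cf in cf_lists]


def passes_contraindices(anchor):
    _cb, (she, her) = anchor
    return she != her  # A9 and A10 are contraindexed


def passes_constraint_3(anchor):
    cb, cf = anchor
    realized = [e for e in CF_PREV if e in cf]
    return cb == (realized[0] if realized else None)


def passes_rule_1(anchor):
    cb, cf = anchor
    return cb in cf  # every element of cf is realized as a pronoun here


def classify(anchor):
    cb, cf = anchor
    cp = cf[0]
    if cb == CB_PREV:
        return "CONTINUING" if cb == cp else "RETAINING"
    return "SHIFTING-1" if cb == cp else "SHIFTING"


anchors = propose_anchors()
filters = [passes_contraindices, passes_constraint_3, passes_rule_1]
survivors = [(ROMAN[i], a) for i, a in enumerate(anchors)
             if all(f(a) for f in filters)]
best_label, best_anchor = min(survivors, key=lambda s: RANK[classify(s[1])])
```

Under these assumptions the enumeration order matches the numbering in step 1(e), the survivors are anchors ii and iii, and anchor ii is selected because it is a shifting-1 transition.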